1 Introduction



The double exchange model [1] has been widely studied as a model for
itinerant ferromagnetism, one of the most interesting subjects in the field
of strong electron correlations. The model was introduced by Zener in 1951
as a canonical model for perovskite manganites [?], where the $e_g$ and
$t_{2g}$ electrons of the manganese 3d bands are treated as itinerant
electrons and localized spins, respectively. Interactions between these two
species of electrons are taken into account through on-site Hund's
couplings, which are stronger than the electron hopping energies. The model
has been investigated extensively for half a century, especially in
connection with the colossal magnetoresistance phenomena in manganites.

Although the model is simple in form, its thermodynamic properties are not
yet well understood. In general, strongly correlated electron systems are
difficult to investigate, both analytically and numerically. In this model
in particular, thermal fluctuations around the critical temperature are so
strong that perturbative approaches as well as mean-field methods do not
work in a controlled manner.

Recently, dynamical mean-field calculations as well as Monte Carlo (MC)
calculations on small lattice clusters have been performed [?]. These
results are more reliable than simple mean-field theories in the sense that
they partially take thermal fluctuations into account. Nevertheless, the
former method completely neglects spatial fluctuations, while the latter
suffers from finite-size errors due to the loss of long-wavelength
fluctuations.

In order to study the thermodynamic properties of the model, especially
critical phenomena and their influence on electron conduction, it is
necessary to perform calculations that properly take fluctuation effects
into account. An MC calculation on a large cluster seems to be a promising
method, provided that finite-size extrapolations are treated properly.

In this paper, we study an algorithm for MC calculations of the double
exchange model that significantly reduces CPU-time consumption. With this
algorithm, calculations for large systems become easier, which enables us
to perform extrapolations to the thermodynamic limit as well as finite-size
scaling analyses.

2 Monte Carlo algorithm 

The double exchange model is defined by 

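$$ \mathcal{H}(\{S_i\}) = -t \sum_{\langle i,j \rangle, \sigma} c^{\dagger}_{i\sigma} c_{j\sigma} - J_{\rm H} \sum_{i} \vec{\sigma}_i \cdot \vec{S}_i , \qquad (1) $$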

where $c$ and $\sigma$ represent operators for itinerant electrons, while
$S$ denotes the localized classical spins. The on-site Hund's coupling
$J_{\rm H}$ gives the interaction energy between itinerant electrons and
localized spins. For a given and fixed spin configuration $\{S_i\}$, the
Hamiltonian is equivalent to a single-body electron system interacting with
random magnetic fields. Properties of the system at finite temperatures are
obtained through the thermodynamic average over the configurations
$\{S_i\}$.

The Boltzmann weight for a spin configuration $\{S_i\}$ is given by

$$ P(\{S_i\}) = \mathrm{Tr} \exp[-\beta(\mathcal{H}(\{S_i\}) - \mu N)] = \prod_{m} \left(1 + \exp[-\beta(\varepsilon_m - \mu)]\right) , \qquad (2) $$

where Tr represents a grand canonical trace over the fermion degrees of
freedom and $\varepsilon_m$ are the eigenvalues of $\mathcal{H}(\{S_i\})$.
MC sampling of the spin configuration $\{S_i\}$ with the probability
density $P(\{S_i\})$ gives stochastic estimates of the thermodynamic
properties of the system. Direct evaluation of eq. (2) requires all the
eigenvalues of $\mathcal{H}(\{S_i\})$. Obtaining all of them through a full
diagonalization of the Hamiltonian matrix requires a calculation of
$O(N_{\rm dim}^3)$, where $N_{\rm dim}$ is the Hilbert space dimension of
the Hamiltonian.
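
As a point of reference, the full-diagonalization evaluation of eq. (2) can
be sketched as follows. This is a minimal illustration, not the code used
for the benchmarks: the open one-dimensional chain, the function names
`build_hamiltonian` and `log_weight_exact`, and the parameter values are
assumptions made here, and the hopping is entered in both directions so that
the matrix is Hermitian.

```python
import numpy as np

# Pauli matrices, stacked as PAULI[a] for a = x, y, z.
PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def build_hamiltonian(spins, t=1.0, j_hund=4.0):
    """Single-particle matrix of eq. (1) on an open chain (illustrative geometry).

    spins: array of shape (N, 3) holding classical unit vectors S_i.
    Basis index 2*i + s labels site i and spin s (0 = up, 1 = down).
    """
    n_sites = len(spins)
    h = np.zeros((2 * n_sites, 2 * n_sites), dtype=complex)
    for i in range(n_sites):
        # On-site Hund's coupling: -J_H (sigma . S_i), a 2x2 block in spin space.
        h[2*i:2*i+2, 2*i:2*i+2] = -j_hund * np.einsum("a,abc->bc", spins[i], PAULI)
    for i in range(n_sites - 1):
        for s in range(2):
            # Spin-conserving hopping between nearest neighbours i and i+1,
            # entered in both directions so that the matrix is Hermitian.
            h[2*i + s, 2*(i+1) + s] = -t
            h[2*(i+1) + s, 2*i + s] = -t
    return h

def log_weight_exact(h, beta, mu):
    """log P of eq. (2) from all eigenvalues (full diagonalization)."""
    eps = np.linalg.eigvalsh(h)
    # logaddexp(0, x) = log(1 + exp(x)), evaluated without overflow.
    return np.sum(np.logaddexp(0.0, -beta * (eps - mu)))
```

Working with log P rather than P keeps the product over m from overflowing
at low temperature; only differences of log P between spin configurations
enter a Metropolis-type acceptance test.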

Alternatively, $P(\{S_i\})$ can be obtained using the density of states
(DOS) [?]. Equation (2) is rewritten as

$$ \log P(\{S_i\}) = \int d\varepsilon \, D(\varepsilon) \log\left(1 + \exp[-\beta(\varepsilon - \mu)]\right) , \qquad (3) $$

where $D(\varepsilon)$ is the DOS, which depends on the spin configuration
$\{S_i\}$. Equation (3) is calculated efficiently by the moment expansion
algorithm [?, ?].
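For a given function $f(\varepsilon)$, we perform a Chebyshev polynomial
expansion

$$ \int d\varepsilon \, D(\varepsilon) f(\varepsilon) = \mu_0 f_0 + 2 \sum_{n \ge 1} \mu_n f_n , \qquad (4) $$

$$ f_n = \int_{-1}^{1} \frac{d\varepsilon}{\pi \sqrt{1 - \varepsilon^2}} \, T_n(\varepsilon) f(\varepsilon) , \qquad (5) $$

$$ \mu_n = \mathrm{Tr}\, T_n(\mathcal{H}) = \sum_{\nu} \langle \nu | T_n(\mathcal{H}) | \nu \rangle , \qquad (6) $$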

where $T_n$ is the $n$-th Chebyshev polynomial, defined by
$T_0(\varepsilon) = 1$, $T_1(\varepsilon) = \varepsilon$, and
$T_{m+1}(\varepsilon) = 2\varepsilon T_m(\varepsilon) - T_{m-1}(\varepsilon)$,
and $\{|\nu\rangle\}$ is a complete set of kets. To ensure the validity of
the expansion, we assume that the Hamiltonian is normalized properly to
satisfy $|\varepsilon_m| < 1$. The moments of the DOS, $\mu_n$, are
calculated for each given spin configuration $\{S_i\}$, while the
coefficients $f_n$ are fixed throughout the MC run.
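
Because the coefficients $f_n$ do not depend on the spin configuration, they
can be tabulated once before the MC run. The sketch below evaluates the
integral in eq. (5) by Chebyshev-Gauss quadrature; this quadrature rule, the
function names, and the default number of quadrature points are choices made
for this illustration and are not taken from the text.

```python
import numpy as np

def chebyshev_coefficients(f, n_moments, n_quad=2000):
    """Coefficients f_n of eq. (5) for n = 0, ..., n_moments - 1.

    With eps = cos(theta), eq. (5) becomes (1/pi) * integral over theta in
    [0, pi] of cos(n * theta) * f(cos(theta)); the midpoint rule in theta
    (Chebyshev-Gauss quadrature) turns this into a plain average.
    """
    theta = np.pi * (np.arange(n_quad) + 0.5) / n_quad
    values = f(np.cos(theta))
    n = np.arange(n_moments)[:, None]
    return (np.cos(n * theta[None, :]) * values[None, :]).mean(axis=1)

def free_energy_kernel(beta, mu):
    """f(eps) = log(1 + exp[-beta (eps - mu)]), the integrand weight in eq. (3)."""
    return lambda eps: np.logaddexp(0.0, -beta * (eps - mu))
```

For example, `chebyshev_coefficients(free_energy_kernel(beta, mu), 40)`
returns the first 40 coefficients; here `beta` and `mu` must be expressed in
the same rescaled energy units that bring the spectrum into $(-1, 1)$.
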
In practice, eq. (6) is calculated as follows. We define

$$ |\nu; m\rangle = T_m(\mathcal{H}) |\nu\rangle . \qquad (7) $$

Using the recursion formula for the Chebyshev polynomials, we have

$$ |\nu; m\rangle = 2\mathcal{H} |\nu; m-1\rangle - |\nu; m-2\rangle . \qquad (8) $$






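From the multiplication formula $T_{m+n} = 2 T_m T_n - T_{m-n}$, the moments
of the DOS are calculated as

$$ \mu_{2m} = \sum_{\nu} \left( 2 \langle \nu; m | \nu; m \rangle - 1 \right) , \qquad (9) $$

$$ \mu_{2m+1} = \sum_{\nu} \left( 2 \langle \nu; m | \nu; m+1 \rangle - \langle \nu; 0 | \nu; 1 \rangle \right) . \qquad (10) $$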
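
The recursion (8) together with eqs. (9) and (10) can be sketched as below.
This is an illustrative dense-matrix version under the assumption that the
kets $|\nu\rangle$ are the unit basis vectors; the function name
`chebyshev_moments` is hypothetical, and a production code would store
$\mathcal{H}$ as a sparse matrix so that each product costs
$O(N_{\rm dim})$.

```python
import numpy as np

def chebyshev_moments(h, n_moments):
    """mu_n = Tr T_n(H) for n = 0, ..., n_moments - 1 via eqs. (7)-(10).

    h must be Hermitian with all eigenvalues inside (-1, 1).
    """
    dim = h.shape[0]
    mu = np.zeros(n_moments)
    half = (n_moments + 1) // 2          # kets |nu; m> are needed for m < half
    for nu in range(dim):
        v_prev = np.zeros(dim, dtype=complex)
        v_prev[nu] = 1.0                 # |nu; 0> = |nu>
        v_curr = h @ v_prev              # |nu; 1> = H |nu>
        overlap01 = np.vdot(v_prev, v_curr).real   # <nu; 0 | nu; 1>
        for m in range(half):
            # At this point v_prev = |nu; m> and v_curr = |nu; m+1>.
            if 2 * m < n_moments:
                mu[2 * m] += 2.0 * np.vdot(v_prev, v_prev).real - 1.0        # eq. (9)
            if 2 * m + 1 < n_moments:
                mu[2 * m + 1] += 2.0 * np.vdot(v_prev, v_curr).real - overlap01  # eq. (10)
            # Recursion (8): |nu; m+2> = 2 H |nu; m+1> - |nu; m>.
            v_prev, v_curr = v_curr, 2.0 * (h @ v_curr) - v_prev
    return mu
```

The point of eqs. (9) and (10) is that kets up to order $m$ yield moments up
to order $2m+1$, so the number of matrix-vector products is roughly half the
number of moments.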

As seen above, the calculation of $P(\{S_i\})$ based on the moment
expansion algorithm costs a CPU time of $O(N_{\rm dim}^2)$, i.e. the sparse
matrix product $\sim O(N_{\rm dim})$ times the trace over the complete set
$\sim O(N_{\rm dim})$, provided the electron hopping is short ranged.
Therefore, when the lattice size is large, this algorithm is faster than the
algorithm based on matrix diagonalization, which costs $O(N_{\rm dim}^3)$.
Moreover, eqs. (7)-(10) can be calculated independently for each $\nu$. Thus
this algorithm is optimized for parallel computations. In contrast, the
previous algorithm based on matrix diagonalization is known to be
inefficient to parallelize.
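
The independence of the contributions from different $\nu$ can be
illustrated with a small sketch in which the basis states are divided among
workers and the partial traces are summed at the end. Python threads stand
in here for the MPI processes of a real cluster, and the plain recursion (8)
is used instead of eqs. (9) and (10) for brevity; both choices, and the
function names, are assumptions of this example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def partial_moments(h, nus, n_moments):
    """Contribution of the basis states in `nus` to mu_n = Tr T_n(H)."""
    dim = h.shape[0]
    mu = np.zeros(n_moments)
    for nu in nus:
        v_prev = np.zeros(dim, dtype=complex)
        v_prev[nu] = 1.0                       # |nu; 0>
        v_curr = h @ v_prev                    # |nu; 1>
        mu[0] += 1.0                           # <nu | T_0 | nu> = 1
        if n_moments > 1:
            mu[1] += v_curr[nu].real           # <nu | T_1(H) | nu>
        for n in range(2, n_moments):
            # Recursion (8), followed by the diagonal element <nu | nu; n>.
            v_prev, v_curr = v_curr, 2.0 * (h @ v_curr) - v_prev
            mu[n] += v_curr[nu].real
    return mu

def moments_parallel(h, n_moments, n_workers=4):
    """Split the trace over nu into chunks and sum the partial results once."""
    chunks = np.array_split(np.arange(h.shape[0]), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda c: partial_moments(h, c, n_moments), chunks)
    return np.sum(list(parts), axis=0)
```

Each worker needs only the matrix and its own list of $\nu$, and returns a
short vector of partial moments, so a single reduction per evaluation of
$P$ is the only communication required.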

This algorithm is applicable to wide classes of strongly correlated
electron systems in which itinerant electrons are coupled to thermally
fluctuating fields $\{\phi_i\}$. If one treats $\{\phi_i\}$ as a classical
field, thermodynamic averages over $\{\phi_i\}$ are calculated by an MC run,
and $P(\{\phi_i\})$ is calculated by the moment expansion algorithm.
Treating $\{\phi_i\}$ as a classical field is justified at finite
temperature, at least near renormalized classical critical points. Such a
region is in general interesting to survey, since the effects of critical
fluctuations on the conduction electrons are highly non-trivial.

3 Benchmark Results and Comments 

Numerical calculations are performed on the Aoyama Plus systems [?],
Beowulf-type clusters of commodity personal computers connected by
100Base-TX Fast Ethernet. Benchmarks are taken on two of the cluster
systems: (i) a 69-node cluster of dual Pentium II 350MHz machines with 384MB
of memory per node (138 processors and 26GB of memory in total) and (ii) an
11-node cluster of dual Celeron 533MHz machines with 256MB of memory per
node (22 processors and 2.8GB of memory in total). We use MPI for parallel
computations. Within a node, shared-memory SMP communications based on
OpenMP interfaces are used.



Lattice size    Previous algorithm    Present algorithm
6 x 6 x 6       200 days              18 hours
8 x 8 x 8       16 years              10 days



Table 1: Benchmark results on Pentium II 350MHz computers for the previous
(full diagonalization) algorithm, run on a single node, and the present
(moment expansion) algorithm, run on 64 parallel nodes. The CPU time for
10,000 MC steps is shown.

In Table 1 we show the benchmark results comparing the full diagonalization
algorithm and the moment expansion algorithm. We take 10,000 MC steps,
which is typically the minimum number necessary for accurate calculations.
Calculations based on the full diagonalization algorithm are performed on
one node, since it is difficult to parallelize them efficiently, while 64
CPUs are used in parallel for the moment expansion algorithm. We see that
the moment expansion algorithm greatly improves the computational speed. An
MC calculation on an $8^3$ lattice, which has been virtually impossible with
the full diagonalization algorithm, is now within our reach with the moment
expansion algorithm. Using commodity PC clusters with over 100 CPUs, it is
practical to perform an MC calculation for a system with
$N_{\rm dim} \sim 10^3$. Thus it is now feasible to investigate finite-size
clusters over a systematic series of lattice sizes. Extrapolations to the
thermodynamic limit as well as finite-size scaling for various thermodynamic
properties will be reported elsewhere [7, ?].
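
The entries of Table 1 can be checked against the cost estimates of
Section 2 with a few lines of arithmetic. The sketch below assumes, which
the text does not state explicitly, that one MC step consists of a number of
local updates proportional to the number of sites $N$, each requiring one
evaluation of $P$, and that $N_{\rm dim}$ is proportional to $N$. Under this
assumption the time per MC step would scale as $N^4$ for full
diagonalization and as $N^3$ for the moment expansion.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

# CPU times for 10,000 MC steps, converted to hours (values from Table 1).
cpu_hours = {
    6: {"previous": 200 * HOURS_PER_DAY, "present": 18},
    8: {"previous": 16 * DAYS_PER_YEAR * HOURS_PER_DAY, "present": 10 * HOURS_PER_DAY},
}

def speedup(linear_size):
    """Ratio of previous to present CPU time for an L x L x L lattice."""
    row = cpu_hours[linear_size]
    return row["previous"] / row["present"]

def growth(algorithm):
    """Measured cost ratio between the 8^3 and 6^3 lattices."""
    return cpu_hours[8][algorithm] / cpu_hours[6][algorithm]

def predicted_growth(exponent):
    """Cost ratio if the time per MC step scales as (number of sites)**exponent."""
    return (8**3 / 6**3) ** exponent

print(f"speedup at 6^3: {speedup(6):.0f}, at 8^3: {speedup(8):.0f}")
print(f"present:  measured {growth('present'):.1f}, N^3 predicts {predicted_growth(3):.1f}")
print(f"previous: measured {growth('previous'):.1f}, N^4 predicts {predicted_growth(4):.1f}")
```

With the rounded figures of Table 1, the measured growth of the present
algorithm agrees closely with the $N^3$ estimate, and that of the previous
algorithm agrees with the $N^4$ estimate to within about ten percent. Note
that the speedup figures combine the algorithmic gain with the use of 64
nodes instead of one.
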

Figure 1: Multiple-node benchmark results of the moment expansion algorithm
for the double exchange model on a 32 x 32 lattice. Calculations are
performed on the dual Pentium II 350MHz cluster system (filled squares) and
the dual Celeron 533MHz cluster system (open squares), as well as on an
SGI2800 (filled grey circles) for comparison.

Benchmark results for the efficiency of the parallel calculation are shown
in Fig. 1. Computational speed is measured by the number of moments
calculated per second. We show the results for (i) the Pentium II system
and (ii) the Celeron system, as well as (iii) the SGI2800 for comparison. We
see that in every system the computational speed scales almost linearly with
the number of nodes $N_{\rm node}$. This indicates that the efficiency of
the parallel computation is quite high for $N_{\rm node} \lesssim 10^2$. The
benchmark results also show that the program runs as fast or even faster on
commodity personal computer systems than on the high-performance (and
high-cost) parallel computing system.

The moment expansion algorithm is optimized for parallel computations,
especially on Beowulf-type commodity systems, in the following sense.

(i) The most CPU-time-consuming part of the MC run is the calculation of
$P(\{S_i\})$, which is completely parallelized. Data transfer occurs only
once for each calculation of $P(\{S_i\})$, while the CPU time for it scales
as $O(N_{\rm dim}^2 / N_{\rm node})$. Thus, for sufficiently large systems,
the data transfer time is negligibly small compared to the calculation time.
The high efficiency of the parallel computation is well understood in terms
of Amdahl's law.

(ii) In general, MC calculations consume a small amount of memory. In the
present case with $N_{\rm dim} \sim 10^3$, less than 100kB of memory is used
for the vectors $|\nu; m\rangle$ and the matrix $\mathcal{H}$, so they fit
within the Level 2 (L2) cache of commodity CPUs. As a result, the
computational speed scales almost linearly with the CPU clock speed.

(iii) Communication among CPUs and access to main memory are, in general,
the bottlenecks of commodity-type clusters. In our algorithm, features (i)
and (ii) hide these disadvantages.

Since the CPU clock speed of commodity computers is now increasing rapidly,
it is becoming easier and easier to construct a commodity computer system
that offers high price-performance for moment expansion calculations.

Let us finally note that the moment expansion algorithm is applicable to
many kinds of strongly correlated electron systems. We have obtained a
powerful algorithm for investigating the thermodynamic properties of various
electronic models that have not yet been studied numerically in a
systematic way.